Engineering proteinase K using machine learning and synthetic genes

نویسندگان

  • Jun Liao
  • Manfred K Warmuth
  • Sridhar Govindarajan
  • Jon E Ness
  • Rebecca P Wang
  • Claes Gustafsson
  • Jeremy Minshull
چکیده

BACKGROUND Altering a protein's function by changing its sequence allows natural proteins to be converted into useful molecular tools. Current protein engineering methods are limited by a lack of high throughput physical or computational tests that can accurately predict protein activity under conditions relevant to its final application. Here we describe a new synthetic biology approach to protein engineering that avoids these limitations by combining high throughput gene synthesis with machine learning-based design algorithms. RESULTS We selected 24 amino acid substitutions to make in proteinase K from alignments of homologous sequences. We then designed and synthesized 59 specific proteinase K variants containing different combinations of the selected substitutions. The 59 variants were tested for their ability to hydrolyze a tetrapeptide substrate after the enzyme was first heated to 68 degrees C for 5 minutes. Sequence and activity data was analyzed using machine learning algorithms. This analysis was used to design a new set of variants predicted to have increased activity over the training set, that were then synthesized and tested. By performing two cycles of machine learning analysis and variant design we obtained 20-fold improved proteinase K variants while only testing a total of 95 variant enzymes. CONCLUSION The number of protein variants that must be tested to obtain significant functional improvements determines the type of tests that can be performed. Protein engineers wishing to modify the property of a protein to shrink tumours or catalyze chemical reactions under industrial conditions have until now been forced to accept high throughput surrogate screens to measure protein properties that they hope will correlate with the functionalities that they intend to modify. By reducing the number of variants that must be tested to fewer than 100, machine learning algorithms make it possible to use more complex and expensive tests so that only protein properties that are directly relevant to the desired application need to be measured. Protein design algorithms that only require the testing of a small number of variants represent a significant step towards a generic, resource-optimized protein engineering process.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data

Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...

متن کامل

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...

متن کامل

Modeling Discharge Coefficient of Side Weir on Converging Channel Using Extreme Learning Machine

In this study, the discharge coefficient of side weirs located on converging channels was simulated for the first time using a new method of Extreme Learning Machine (ELM). To examine the accuracy of the numerical model, the Monte Carlo simulations were used and the experimental values validation was conducted by the k-fold cross validation method. Then, the input parameters were detected for s...

متن کامل

Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods

Background: DNA microarray is a useful technology that simultaneously assesses the expression of thousands of genes. It can be utilized for the detection of cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method. Materials and Methods: In this descriptive study, the microarray ...

متن کامل

Trust Classification in Social Networks Using Combined Machine Learning Algorithms and Fuzzy Logic

Social networks have become the main infrastructure of today’s daily activities of people during the last decade. In these networks, users interact with each other, share their interests on resources and present their opinions about these resources or spread their information. Since each user has a limited knowledge of other users and most of them are anonymous, the trust factor plays an import...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • BMC Biotechnology

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2007